Parsing the WSJ Using CCG and Log-Linear Models
Authors
Stephen Clark, James R. Curran
Abstract
This paper describes and evaluates log-linear parsing models for Combinatory Categorial Grammar (CCG). A parallel implementation of the L-BFGS optimisation algorithm is described, which runs on a Beowulf cluster allowing the complete Penn Treebank to be used for estimation. We also develop a new efficient parsing algorithm for CCG which maximises expected recall of dependencies. We compare models which use all CCG derivations, including nonstandard derivations, with normal-form models. The performances of the two models are comparable and the results are competitive with existing wide-coverage CCG parsers.
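As a rough illustration of the modelling idea only (not the authors' implementation), the sketch below scores a handful of explicitly listed candidate derivations with a conditional log-linear model, where p(d | S) is proportional to exp(lambda . f(d)) for a feature-count vector f(d) and weight vector lambda. The feature names and weights are invented for the example; the real parser works over a packed chart rather than an explicit list of derivations.

```python
# Minimal sketch: conditional log-linear scoring of candidate CCG derivations.
# p(d | S) is proportional to exp(lambda . f(d)); features and weights are toy values.
import math
from collections import Counter

def derivation_probabilities(derivations, weights):
    """Return the conditional probability of each candidate derivation.

    derivations: list of feature Counters, one per derivation of the sentence.
    weights: dict mapping feature name -> real-valued weight (lambda).
    """
    scores = [sum(weights.get(feat, 0.0) * count for feat, count in d.items())
              for d in derivations]
    # Log-sum-exp for numerical stability when normalising.
    m = max(scores)
    z = m + math.log(sum(math.exp(s - m) for s in scores))
    return [math.exp(s - z) for s in scores]

# Toy usage with hypothetical features (rule instantiations, word-category pairs):
cands = [Counter({"fwd_app": 3, "saw+((S\\NP)/NP)": 1}),
         Counter({"fwd_app": 2, "bwd_comp": 1, "saw+((S\\NP)/NP)": 1})]
lam = {"fwd_app": 0.4, "bwd_comp": -0.2, "saw+((S\\NP)/NP)": 1.1}
print(derivation_probabilities(cands, lam))
```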
Similar Papers
Log-Linear Models for Wide-Coverage CCG Parsing
This paper describes log-linear parsing models for Combinatory Categorial Grammar (CCG). Log-linear models can easily encode the long-range dependencies inherent in coordination and extraction phenomena, which CCG was designed to handle. Log-linear models have previously been applied to statistical parsing, under the assumption that all possible parses for a sentence can be enumerated. Enumerat...
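Because enumerating every parse quickly becomes infeasible for a wide-coverage grammar, such models are typically estimated over a packed representation of the parse space rather than an explicit list. The sketch below shows the inside computation of the normalising constant Z over a hypothetical packed forest (each node lists its alternative local ways of being built); the encoding is an assumption made for illustration, not the paper's data structure.

```python
# Hypothetical packed forest: each node has a list of ways to build it,
# each way = (local_feature_score, child_node_ids). The inside value of a node
# sums exp(score) over all derivations rooted at it, computed bottom-up.
import math

def inside(forest, root, topo_order):
    """Compute Z = sum over derivations of exp(score) without enumerating them.

    forest: dict node_id -> list of (local_score, [child_ids]) alternatives.
    topo_order: node ids ordered so children precede parents.
    """
    inside_val = {}
    for node in topo_order:
        total = 0.0
        for local_score, children in forest[node]:
            prod = math.exp(local_score)
            for c in children:
                prod *= inside_val[c]
            total += prod
        inside_val[node] = total
    return inside_val[root]

# Toy forest: leaves have one zero-child alternative; the root has two analyses.
forest = {
    "NP_0_1": [(0.5, [])],
    "VP_1_3": [(0.2, [])],
    "S_0_3": [(0.7, ["NP_0_1", "VP_1_3"]), (0.1, ["NP_0_1", "VP_1_3"])],
}
print(inside(forest, "S_0_3", ["NP_0_1", "VP_1_3", "S_0_3"]))
```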
Wide-Coverage Efficient Statistical Parsing with CCG and Log-Linear Models
This paper describes a number of log-linear parsing models for an automatically extracted lexicalized grammar. The models are “full” parsing models in the sense that probabilities are defined for complete parses, rather than for independent events derived by decomposing the parse tree. Discriminative training is used to estimate the models, which requires incorrect parses for each sentence in t...
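As a sketch of what this kind of discriminative estimation involves, assuming the candidate (including incorrect) derivations are available as explicit feature-count vectors, the gradient of the conditional log-likelihood is the gold feature counts minus their expectation under the model. In practice the expectations are computed over a packed chart; the explicit lists here are a simplification.

```python
# Sketch of the conditional log-likelihood gradient used in discriminative training:
# d/d(lambda_i) = f_i(gold) - E_p[f_i], i.e. gold counts minus expected counts.
import math
from collections import Counter, defaultdict

def loglikelihood_gradient(gold_feats, candidate_feats, weights):
    scores = [sum(weights.get(f, 0.0) * c for f, c in d.items())
              for d in candidate_feats]
    m = max(scores)
    z = m + math.log(sum(math.exp(s - m) for s in scores))
    probs = [math.exp(s - z) for s in scores]

    grad = defaultdict(float)
    for f, c in gold_feats.items():
        grad[f] += c                      # empirical (gold) feature counts
    for p, d in zip(probs, candidate_feats):
        for f, c in d.items():
            grad[f] -= p * c              # minus expected counts under the model
    return dict(grad)

# Toy usage with invented features:
gold = Counter({"fwd_app": 3})
cands = [Counter({"fwd_app": 3}), Counter({"fwd_app": 2, "type_raise": 1})]
print(loglikelihood_gradient(gold, cands, {"fwd_app": 0.5, "type_raise": 0.0}))
```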
Towards Broad Coverage Surface Realization with CCG
This paper reports on progress towards developing the first broad coverage English surface realizer for Combinatory Categorial Grammar (CCG). The paper provides initial automatic evaluation results which are roughly comparable to those reported with other formalisms when using a (nonblind) grammar derived from the development section of the CCGbank; the results are worse, though still respectab...
Perceptron Training for a Wide-Coverage Lexicalized-Grammar Parser
This paper investigates perceptron training for a wide-coverage CCG parser and compares the perceptron with a log-linear model. The CCG parser uses a phrase-structure parsing model and dynamic programming in the form of the Viterbi algorithm to find the highest scoring derivation. The difficulty in using the perceptron for a phrase-structure parsing model is the need for an efficient decoder. W...
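A minimal structured-perceptron sketch follows, assuming a decode(sentence, weights) callable that stands in for the parser's Viterbi search over the chart; the feature sets and data are toy placeholders, not the paper's model.

```python
# Structured perceptron: decode with current weights, and when the prediction
# differs from the gold derivation, add gold features and subtract predicted ones.
from collections import Counter

def perceptron_epoch(training_data, decode, weights):
    """One pass of structured-perceptron training.

    training_data: list of (sentence, gold feature Counter) pairs.
    decode: callable(sentence, weights) -> feature Counter of the best derivation.
    weights: dict feature -> weight, updated in place.
    """
    for sentence, gold_feats in training_data:
        pred_feats = decode(sentence, weights)
        if pred_feats != gold_feats:
            for f, c in gold_feats.items():
                weights[f] = weights.get(f, 0.0) + c
            for f, c in pred_feats.items():
                weights[f] = weights.get(f, 0.0) - c
    return weights

# Toy usage with a trivially small candidate set standing in for the chart:
def toy_decode(sentence, w):
    cands = [Counter({"fwd_app": 2}), Counter({"fwd_app": 1, "bwd_app": 1})]
    return max(cands, key=lambda d: sum(w.get(f, 0.0) * c for f, c in d.items()))

data = [("he saw her", Counter({"fwd_app": 1, "bwd_app": 1}))]
print(perceptron_epoch(data, toy_decode, {}))
```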
The Importance of Supertagging for Wide-Coverage CCG Parsing
This paper describes the role of supertagging in a wide-coverage CCG parser which uses a log-linear model to select an analysis. The supertagger reduces the derivation space over which model estimation is performed, reducing the space required for discriminative training. It also dramatically increases the speed of the parser. We show that large increases in speed can be obtained by tightly int...
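The sketch below illustrates one common way a supertagger shrinks the derivation space: for each word, keep every lexical category whose probability is within a factor beta of the most probable one, so a tighter beta hands the parser fewer categories. The tag distributions are placeholders, not the paper's supertagger output.

```python
# Beta-cutoff multitagging: per word, retain categories within a factor beta of the best.
def multitag(tag_distributions, beta):
    """tag_distributions: list (one per word) of dicts category -> probability."""
    kept = []
    for dist in tag_distributions:
        best = max(dist.values())
        kept.append({cat: p for cat, p in dist.items() if p >= beta * best})
    return kept

# Toy distributions for a three-word sentence:
words_probs = [
    {"NP": 0.9, "N": 0.1},
    {"(S\\NP)/NP": 0.7, "(S\\NP)/PP": 0.2, "S\\NP": 0.1},
    {"NP": 0.8, "N/N": 0.2},
]
print(multitag(words_probs, beta=0.3))
```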